ON  OPTIMAL  CONTROL  OF  A  BROWNIAN  MOTION


by 

Yu-Chung  Liao 


Abstract 


We consider a controlled diffusion process which evolves as a reflected
Brownian motion under each control action. A switching cost is incurred
when the control action is switched. The control problem turns out to be
a sequential decision problem, i.e., to find a sequence of optimal stopping
times at which to switch control. The dynamic programming equation for a discounted
cost criterion is a quasi-variational inequality (Q.V.I.). By letting the discount
factors tend to zero, we show that a new Q.V.I. has a solution that serves as a
potential function from which an optimal strategy for a long-run average cost
criterion can be obtained.


Key words: diffusion, switching cost, quasi-variational inequality,
potential function, long-run average cost.


1. Introduction


Optimal  control  of  reflected  Brownian  motion  arises  naturally  from 
input-output  systems.  Faddy  [4]  models  a  dam  by  a  Brownian  motion  with  two 
reflecting  barriers.  Puterman  [9]  uses  diffusion  processes  to  model  production 
and  inventory  processes.  In  both  cases  they  assume  the  existence  of  a 
stationary optimal strategy and start from there. In Rath [10], a bang-bang
strategy is proved to be optimal among stationary strategies by using a
random walk to approximate Brownian motion. Chernoff and Petkau [2] prove
that certain strategies satisfy the optimality conditions. All of these
papers treat the case of linear holding costs and two control actions.

Here,  we  consider  a  controlled  diffusion  process  which  evolves  as  a 
reflected  Brownian  motion.  A  switching  cost  is  incurred  when  the  control  action 
is  switched.  Since  the  instants  of  switches  are  crucial,  the  optimal  control 
problem  turns  out  to  be  a  sequential  decision  problem.  We  can  write  the 
dynamic  programming  equation  for  a  discounted  cost  criterion  by  the  principle 
of dynamic programming in Fleming-Rishel [5]. It is a quasi-variational inequality,
which can be solved by the penalty method in Bensoussan-Lions [1].

By letting the discount factors tend to zero, a new quasi-variational inequality
arises as the dynamic programming equation for a long-run average cost criterion.
We solve it to prove the existence of a stationary optimal strategy.

2.  Model 

Let $(\Omega, \mathcal{F}, P)$ be a probability space on which a standard Brownian motion
$W_t$ is defined, and let $\{\mathcal{F}_t\}$ be the increasing family of complete $\sigma$-fields generated
by $W_t$. Let $S$ be the set of $\mathcal{F}_t$-stopping times. Let $A = \{1, 2, \ldots, M\}$ be
the  set  of  control  actions.  Under  control  action  i,  the  controlled  process 
evolves  as  the  reflected  Brownian  motion 

$$\mathrm{Rf}(x + d_i t + \sigma_i W_t), \tag{1}$$

where $\mathrm{Rf}$ is a function on $C[0,\infty)$ defined as

$$\mathrm{Rf}(w)(t) = w(t) - \inf\{0,\ w(s) : s \le t\}$$

for all $w \in C[0,\infty)$. The operating and holding cost is $f(x,i)$ per unit
time if the state of the process is $x$ and action $i$ is used. When switching
from action $i$ to $j$, a switching cost $C(i,j)$ is incurred. Since
infinitely many switches in a finite time interval would make the total cost
blow up, we can, without loss of generality, define a strategy $u = \{s(n), u(n)\}_{n=1}^\infty$
to be admissible if

i) $s(n) \in S$ for all $n$,

ii) $0 \le s(n) < s(n+1)$ for all $n$,

iii) $s(n) \to \infty$ as $n \to \infty$ w.p.1,

iv) $u(n): \Omega \to A$ is $\mathcal{F}_{s(n)}$-measurable and $u(n) \ne u(n+1)$ for all $n$.

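Then, given the initial state $(x,i)$ and an admissible strategy $u$, the control process is

$$u(t) = \begin{cases} i, & t < s(1), \\ u(n), & s(n) \le t < s(n+1), \end{cases}$$

and the controlled process is

$$x(t) = \mathrm{Rf}\Big(x + \int_0^{\cdot} d_{u(s)}\,ds + \int_0^{\cdot} \sigma_{u(s)}\,dW_s\Big)(t).$$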
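To make the construction of $x(t)$ concrete, here is a minimal Euler sketch, assuming a fixed (open-loop) schedule of deterministic switch times and a uniform time grid. The helper names `reflect`, `control_path` and `simulate` are illustrative only. On a sampled path, applying $\mathrm{Rf}$ amounts to subtracting the running minimum of $\min(w, 0)$.

```python
import numpy as np

def reflect(w: np.ndarray) -> np.ndarray:
    """Discrete Skorokhod map: Rf(w)(t) = w(t) - inf{0, w(s) : s <= t} on a sampled path."""
    return w - np.minimum.accumulate(np.minimum(w, 0.0))

def control_path(times, i0, switch_times, actions):
    """u(t) = i0 for t < s(1) and u(t) = u(n) for s(n) <= t < s(n+1)."""
    u = np.full(times.shape, i0, dtype=int)
    for s, a in zip(switch_times, actions):  # switch times are assumed increasing
        u[times >= s] = a
    return u

def simulate(x0, i0, switch_times, actions, d, sigma, T=10.0, n=10_000, seed=0):
    """Euler sketch of x(t) = Rf(x0 + int d_u ds + int sigma_u dW)(t)."""
    rng = np.random.default_rng(seed)
    dt = T / n
    times = np.arange(n + 1) * dt
    u = control_path(times, i0, switch_times, actions)
    dW = rng.normal(0.0, np.sqrt(dt), size=n)
    # left-point (Ito) discretization of both integrals
    incr = d[u[:-1]] * dt + sigma[u[:-1]] * dW
    free = x0 + np.concatenate(([0.0], np.cumsum(incr)))
    return times, u, reflect(free)

d = np.array([-0.5, -1.0])    # drifts d_i < 0, as in (3)
sigma = np.array([1.0, 0.5])  # sigma_i != 0, as in (2)
times, u, x = simulate(1.0, 0, [2.0, 5.0], [1, 0], d, sigma)
```

The reflected path never goes below 0, and the free path is only shifted up by the amount needed to keep it there.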

We assume the following conditions throughout this paper.

(2) $\sigma_i \ne 0$, $i \in A$,

(3) $d_i < 0$, $i \in A$,

(4) $f(\cdot,i): \mathbb{R}_+ \to \mathbb{R}_+$ is bounded, measurable and nondecreasing, $i \in A$,

(5) $C(i,j) > 0$ and $C(i,j) + C(j,l) > C(i,l)$, $i \ne j$ and $j \ne l$.

Here,  (3)  is  a  stability  condition.  See  Kushner  [7]  and  [8]. 
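The conditions (2), (3) and (5) on the data can be checked mechanically; condition (4) concerns the cost function $f$ and is not checked. This is a small sketch under the assumed convention $C(i,i) = 0$ (the text never switches from $i$ to $i$); the function name `check_assumptions` is hypothetical.

```python
import itertools
import numpy as np

def check_assumptions(sigma, d, C) -> bool:
    """Return True if (2), (3) and (5) hold. Assumes the convention C[i][i] = 0."""
    sigma = np.asarray(sigma, dtype=float)
    d = np.asarray(d, dtype=float)
    C = np.asarray(C, dtype=float)
    M = len(sigma)
    if np.any(sigma == 0.0):  # (2): nondegenerate noise
        return False
    if np.any(d >= 0.0):      # (3): stability, negative drift under every action
        return False
    for i, j in itertools.permutations(range(M), 2):
        if C[i, j] <= 0.0:    # (5): positive switching costs
            return False
    for i, j, l in itertools.product(range(M), repeat=3):
        # (5): switching i -> j -> l costs strictly more than switching i -> l directly
        if i != j and j != l and C[i, j] + C[j, l] <= C[i, l]:
            return False
    return True

C_ok = [[0.0, 1.0, 1.5], [1.0, 0.0, 1.0], [1.5, 1.0, 0.0]]
C_bad = [[0.0, 1.0, 3.0], [1.0, 0.0, 1.0], [3.0, 1.0, 0.0]]  # 1 + 1 <= 3 violates (5)
```

The strict triangle inequality in (5) is what rules out two switches at the same instant in the proof of Theorem 3.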


3.  Preliminary  Results. 

The use of variational inequality techniques to solve sequential decision
problems has been studied extensively in [1]. For completeness, we briefly discuss
some results in a form suited for use in the next section.

Let $G$ be an open subset of $\mathbb{R}_+$, $\gamma \ge 0$, $p > 1$ and $D = d/dx$. We
denote by $W^{n,p,\gamma}(G)$ the space of all functions $f$ on $G$ such that

$$\sum_{k=0}^{n} \big\| e^{-\gamma x} D^k f(x) \big\|_{L^p(G)} < \infty,$$

and by $W^{n,p}(G)$ this space when $\gamma = 0$. Since the generalized Itô formula in
[1] holds for the diffusion processes with reflecting boundary in Stroock-Varadhan [13],
the following theorem is proved by the penalty method in [1].

We state it without proof.
Here, $x(t)$ is the process in (1), and $\tau = \inf\{t : x(t) = B\}$.

Corollary 1. Suppose $h \in W^{1,\infty}(G)$ satisfies the following: there are a constant $c$
and a sequence $\{h_n\}$ such that

$$h_n \text{ satisfies (6)}, \quad D^2 h_n \le c \text{ for all } n, \quad \text{and } h_n \to h \text{ as } n \to \infty \text{ in } W^{1,\infty}(G). \tag{10}$$

Then Theorem 1 holds. In this case, $\|(D^2 h)^+\|_G$ in (8) is replaced by
$c^+ = \max\{0, c\}$.

Proof. Let $Z_n$ be the solution of (7) with respect to $h_n$. By (8) there is
a subsequence of $n$, still denoted by $n$, such that $Z_n \to Z$ weakly in
$W^{2,p}(G)$ and strongly in $W^{1,p}(G)$. Hence $Z$ satisfies (7), (8) and (9).

The same conclusions as in Corollary 1 hold when the Neumann boundary
condition (7e) is replaced by a Dirichlet boundary condition.

Corollary 2. Let $G = [B_1, B_2]$ and $B_1 > 0$. Then Corollary 1 is true if (7e)
and (7f) are replaced by

(e') $Z(B_i) = v_i$ and $v_i \le h(B_i)$, $i = 1, 2$.

If $r|v_i| \le \|f\|_G$ for $i = 1, 2$, then $|v|$ in (8) can be removed, i.e.,

$$\|L^i Z\|_G \le 2\|f\|_G - d_i \|Dh\|_G + \tfrac12 \sigma_i^2 c^+. \tag{11}$$

Another  result  from  Robin  [11]  is 

Theorem 2. Let $f \in L^\infty(\mathbb{R}_+)$, $r > 0$, and let $h$ be bounded and continuous on $\mathbb{R}_+$.
Let $x(t)$ be the process in (1) and

$$Z(x) = \inf_{s \in S} E_x\Big\{\int_0^s e^{-rt} f(x(t))\,dt + e^{-rs} h(x(s))\Big\}. \tag{12}$$

Then $Z$ is bounded and continuous on $\mathbb{R}_+$, $Z \le h$, and

$$Z(x) = E_x\Big\{\int_0^\tau e^{-rt} f(x(t))\,dt + e^{-r\tau} h(x(\tau))\Big\},$$

where

$$\tau = \inf\{t : Z(x(t)) = h(x(t))\}.$$
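Theorem 2 can be explored numerically. The sketch below swaps in a random-walk approximation of the reflected Brownian motion (in the spirit of the random-walk approximation mentioned for Rath [10]) and solves the discrete optimal stopping problem by value iteration. The grid, the step rule, the artificial reflecting barrier at the right end of the grid and the name `optimal_stopping` are all assumptions of this illustration, not the construction used in [11].

```python
import numpy as np

def optimal_stopping(f, h, d, sigma, r, delta, tol=1e-10, max_iter=1_000_000):
    """Value iteration for Z = min(h, f*dt + exp(-r*dt) * E[Z(next)]) on x_k = k*delta.

    Steps are +-delta with dt = delta**2 / sigma**2 and up-probability
    (1 + d*delta/sigma**2) / 2, so the mean step is d*dt and the variance about sigma**2*dt.
    """
    dt = delta**2 / sigma**2
    p_up = 0.5 * (1.0 + d * delta / sigma**2)
    if not 0.0 < p_up < 1.0:
        raise ValueError("delta is too large for this drift")
    disc = np.exp(-r * dt)
    Z = np.array(h, dtype=float)
    for _ in range(max_iter):
        up = np.concatenate((Z[1:], Z[-1:]))   # artificial reflection at the right end
        down = np.concatenate((Z[:1], Z[:-1]))  # reflection at 0: a down-step from 0 stays at 0
        Z_new = np.minimum(h, f * dt + disc * (p_up * up + (1.0 - p_up) * down))
        if np.max(np.abs(Z_new - Z)) < tol:
            return Z_new
        Z = Z_new
    return Z

x = np.arange(0.0, 10.0, 0.05)
f = np.minimum(x, 1.0)                 # bounded, nondecreasing, nonnegative
h = 1.5 + 0.1 * np.minimum(x, 3.0)     # bounded, nondecreasing, nonnegative
Z = optimal_stopping(f, h, d=-0.5, sigma=1.0, r=0.5, delta=0.05, tol=1e-9)
```

The computed $Z$ satisfies $Z \le h$, and with nondecreasing nonnegative $f$ and $h$ it is nondecreasing and nonnegative, which is the discrete counterpart of Lemma 1 below.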

The  next  lemma  gives  useful  estimates. 

Lemma 1. If both $f$ and $h$ in Theorem 2 are nondecreasing and nonnegative,
then so is $Z$.

Proof. Given $x \ge y$, we have

$$\mathrm{Rf}(x + d_i t + \sigma_i W_t) \ge \mathrm{Rf}(y + d_i t + \sigma_i W_t) \quad \text{w.p.1}.$$

By (12), it is clear that $Z(x) \ge Z(y) \ge 0$.
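The pathwise comparison in the proof can be seen directly: two reflected paths driven by the same Brownian increments and started at $x \ge y$ keep their order, their gap never exceeds $x - y$, and the gap can only shrink (it becomes 0 once the upper path reaches 0). A minimal sketch on a sampled path, with illustrative names and parameters:

```python
import numpy as np

def reflect(w):
    """Discrete Skorokhod map: w -> w - min(0, running minimum of w)."""
    return w - np.minimum.accumulate(np.minimum(w, 0.0))

def coupling_gap(x, y, d_i, sigma_i, T=5.0, n=5_000, seed=1):
    """Gap between two reflected paths started at x >= y, driven by the SAME noise."""
    rng = np.random.default_rng(seed)
    dt = T / n
    steps = d_i * dt + sigma_i * rng.normal(0.0, np.sqrt(dt), n)
    path = np.concatenate(([0.0], np.cumsum(steps)))
    return reflect(x + path) - reflect(y + path)

gap = coupling_gap(1.5, 0.7, d_i=-0.5, sigma_i=1.0)
```

Algebraically the gap equals $(x - y) - [\min(0, x + m_t) - \min(0, y + m_t)]$, where $m_t$ is the running minimum of the free path, which is why it lies in $[0, x - y]$ and is nonincreasing in $t$.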

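Under the assumptions of Lemma 1, for $x \ge y$ we have

$$0 \le Z(x) - Z(y) \le E_x\Big\{\int_0^\tau e^{-rt} f(x(t))\,dt + e^{-r\tau} Z(y)\Big\} - Z(y),$$

where

$$\tau = \inf\{t : x(t) = y\};$$

the right-hand side is the cost of waiting until $x(t)$ first reaches $y$ and then stopping optimally from $y$.

By Karlin-Taylor [6],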


Now $M_i V_{m-1}^r(x)$ satisfies (10) when $V_{m-1}^r(x,j) \in W^{2,\infty}(\mathbb{R}_+)$ for all $j \in A$.
By induction, we can choose the boundary condition $v = V_m^r(B)$ and use Corollary
1  to  show  locally,  hence  globally,  that 

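$$\begin{aligned}
&(a)\ V_m^r(\cdot,i) \in W^{2,\infty}(\mathbb{R}_+), && i \in A,\ m \ge 0,\\
&(b)\ V_m^r(x,i) - M_i V_{m-1}^r(x) \le 0, && i \in A,\ m \ge 1,\\
&(c)\ L^i V_m^r(x,i) + r V_m^r(x,i) - f(x,i) \le 0 \text{ a.e. on } \mathbb{R}_+, && i \in A,\ m \ge 0,\\
&(d)\ \text{(b)} \times \text{(c)} = 0,\\
&(e)\ D V_m^r(0,i) = 0, && i \in A.
\end{aligned} \tag{15}$$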


Here  (a)  comes  from  (11)  and  (14)  because  the  upper  bound  in  (11)  is  actually 
independent  of  G.  Let 

$$V^r(x,i) = \inf_{u \in U} E^u_{x,i}\Big\{\int_0^\infty e^{-rt} f(x(t),u(t))\,dt + \sum_{n=1}^\infty e^{-r s(n)} C(u(n-1),u(n))\Big\},$$

where $U$ is the set of all admissible strategies and $C(u(0),u(1)) = C(i,u(1))$.

Theorem 3. $V^r(x,i)$ is the unique solution of the following quasi-variational
inequality:

$$\begin{aligned}
&(a)\ V^r(\cdot,i) \in W^{2,\infty}(\mathbb{R}_+), && i \in A,\\
&(b)\ L^i V^r(x,i) + r V^r(x,i) - f(x,i) \le 0 \text{ a.e.}, && i \in A,\\
&(c)\ V^r(x,i) - M_i V^r(x) \le 0, && i \in A,\\
&(d)\ \text{(b)} \times \text{(c)} = 0, && i \in A,\\
&(e)\ D V^r(0,i) = 0, && i \in A.
\end{aligned} \tag{16}$$

Also, $V^r(x,i)$ is nondecreasing in $x$ for all $i$, and there is a constant $K$
independent of $r$ such that

$$\|DV^r(\cdot,i)\|_{\mathbb{R}_+} \le K \tag{17}$$

and

$$\|D^2 V^r(\cdot,i)\|_{\mathbb{R}_+} \le K \quad \text{for all } i \in A. \tag{18}$$

Proof. The proof is the same as in Evans-Friedman [3]: $V_m^r(x,i)$ is the optimal cost
function for controlling the process with no more than $m$ switches,

$$\|r V_m^r(\cdot,i)\|_{\mathbb{R}_+} \le \|f(\cdot,i)\|_{\mathbb{R}_+}, \quad i \in A, \tag{19}$$

and

$$V_m^r(x,i) \to V^r(x,i) \quad \text{as } m \to \infty \tag{20}$$

uniformly on $\mathbb{R}_+$. Now all we need is to estimate $D^2 V_m^r(y,i)$ for any $y \in \mathbb{R}_+$,
$i \in A$ and $m \ge 2$. If

$$V_m^r(y,i) < M_i V_{m-1}^r(y),$$

then there is a neighborhood $G$ of $y$ such that

$$L^i V_m^r(x,i) + r V_m^r(x,i) - f(x,i) = 0 \quad \text{a.e. on } G. \tag{21}$$

By (13) and (19), we have

$$\|DV_m^r(\cdot,i)\|_G \le 6\sigma_i^{-2} \|f(\cdot,i)\|_{\mathbb{R}_+}. \tag{22}$$


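If instead $V_m^r(y,i) = M_i V_{m-1}^r(y)$, there is a set $A' \subset A$ with $i \notin A'$ such that

$$V_m^r(y,i) = C(i,j) + V_{m-1}^r(y,j), \quad j \in A',$$

and

$$V_m^r(y,i) < C(i,j) + V_{m-1}^r(y,j), \quad j \notin A' \text{ and } i \ne j.$$

By (5), an immediate second switch from $j$ to some $l$ would cost more than switching directly from $i$ to $l$, so we have

$$V_{m-1}^r(y,j) < M_j V_{m-2}^r(y), \quad j \in A'.$$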

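So there is a neighborhood $G$ of $y$ on which $V_{m-1}^r(x,j)$ satisfies (21),
hence (22), for all $j \in A'$. Thus,

$$V_m^r(x,i) = \inf_{s \in S} E_x\Big\{\int_0^{\tau \wedge s} e^{-rt} f(x(t),i)\,dt + e^{-rs} I_{\{\tau > s\}} h(x(s)) + e^{-r\tau} I_{\{\tau \le s\}} V_m^r(x(\tau),i)\Big\}, \quad x \in G, \tag{23}$$

where

$$h(x) = \min_{j \in A'} \{C(i,j) + V_{m-1}^r(x,j)\} \quad \text{on } G,$$

$$\tau = \inf\{t : x(t) \in \partial G \text{ and } x(t) > 0\}.$$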


By Corollary 2, (11) and (22), we have

$$\|L^i V_m^r(\cdot,i)\|_G \le 6 \|f(\cdot,i)\|_{\mathbb{R}_+}. \tag{24}$$

From (14), (22) and (24), there is a constant $K$ independent of $r$ and $m$
such that

$$\|L^i V_m^r(\cdot,i)\|_{\mathbb{R}_+} \le K$$

and

$$\|DV_m^r(\cdot,i)\|_{\mathbb{R}_+} \le K.$$

So the theorem is proved by letting $m \to \infty$ in (15).
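The Q.V.I. (16) can also be explored numerically. The sketch below is not the penalty method of [1] or the construction in the proof above: it swaps in a Markov chain (random-walk) approximation and value iteration, "continue one step with action $i$" versus "switch to $j$, then continue one step with $j$". It assumes $M_i V(x) = \min_{j \ne i}\{C(i,j) + V(x,j)\}$ (consistent with the proof above), a finite grid with an artificial reflecting barrier on the right, and hypothetical names throughout.

```python
import numpy as np

def switch_operator(V, C):
    """(M V)[k, i] = min_{j != i} (C[i, j] + V[k, j]); the definition of M_i is assumed."""
    M = V.shape[1]
    mask = np.where(np.eye(M, dtype=bool), np.inf, 0.0)  # exclude j == i
    return np.min(C[None, :, :] + V[:, None, :] + mask[None, :, :], axis=2)

def continuation(V, f, d, sigma, r, delta):
    """One step under the current action on a chain approximating the reflected diffusion."""
    s2max = np.max(sigma**2)
    dt = delta**2 / s2max                      # common time step for all actions
    p_up = 0.5 * (sigma**2 + d * delta) / s2max
    p_dn = 0.5 * (sigma**2 - d * delta) / s2max
    p_stay = 1.0 - p_up - p_dn
    up = np.vstack([V[1:], V[-1:]])            # artificial reflection at the right end
    dn = np.vstack([V[:1], V[:-1]])            # reflection at 0
    return f * dt + np.exp(-r * dt) * (p_up * up + p_dn * dn + p_stay * V)

def solve_qvi(f, C, d, sigma, r, delta, tol=1e-10, max_iter=1_000_000):
    """Iterate V = min(continue, switch then continue) until the sup-norm change is < tol."""
    if np.any(sigma**2 + d * delta < 0):
        raise ValueError("delta too large: negative transition probability")
    V = np.zeros_like(f)
    for _ in range(max_iter):
        cont = continuation(V, f, d, sigma, r, delta)
        V_new = np.minimum(cont, switch_operator(cont, C))
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

N, delta, r = 100, 0.1, 0.5
x = delta * np.arange(N)
f = np.column_stack([np.minimum(x, 2.0), 0.5 + 0.5 * np.minimum(x, 2.0)])
C = np.array([[0.0, 0.5], [0.5, 0.0]])
d = np.array([-0.2, -1.0])
sigma = np.array([1.0, 1.0])
V = solve_qvi(f, C, d, sigma, r, delta)
MV = switch_operator(V, C)
cont = continuation(V, f, d, sigma, r, delta)
```

At the fixed point, $V \le$ `cont` plays the role of (16b), $V \le M V$ of (16c), and at every grid point one of the two holds with equality, mirroring (16d). The strict triangle inequality (5) is what makes "one switch per instant" enough here.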


4.  Minimum  Average  Cost  Problem. 

The total cost of controlling the process by strategy $u$ up to time $T$
with initial state $(x,i)$ is

$$J(u,x,i,T) = E^u_{x,i}\Big\{\int_0^T f(x(t),u(t))\,dt + \sum_{n=1}^\infty I_{\{s(n) < T\}} C(u(n-1),u(n))\Big\}.$$

The long-run average cost is

$$\theta(u,x,i) = \liminf_{T \to \infty} \frac{J(u,x,i,T)}{T}. \tag{25}$$
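A crude way to get a feel for $J(u,x,i,T)/T$ is a single-path simulation. The sketch below assumes $M = 2$ and a hypothetical stationary hysteresis rule (switch $0 \to 1$ when $x \ge b_{up}$, and $1 \to 0$ when $x \le b_{dn}$, with $b_{dn} < b_{up}$); this rule, the Euler step and all names are illustrative assumptions, not the optimal strategy of Theorem 5. Since $J$ is an expectation, one path only gives an estimate.

```python
import numpy as np

def average_cost_hysteresis(f, C, d, sigma, x0, i0, b_dn, b_up, T=50.0, dt=1e-3, seed=0):
    """One-path estimate of J(u, x0, i0, T) / T under a hypothetical hysteresis rule."""
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    z = rng.standard_normal(n)
    sq = np.sqrt(dt)
    x, i = float(x0), i0
    running, switching, n_switch = 0.0, 0.0, 0
    for k in range(n):
        if i == 0 and x >= b_up:
            j = 1
        elif i == 1 and x <= b_dn:
            j = 0
        else:
            j = i
        if j != i:                    # pay the switching cost C(i, j)
            switching += C[i][j]
            n_switch += 1
            i = j
        running += f(x, i) * dt       # operating and holding cost
        # Euler step followed by the one-step reflection x -> max(x, 0)
        x = max(x + d[i] * dt + sigma[i] * sq * z[k], 0.0)
    return (running + switching) / T, n_switch

f = lambda x, i: min(x, 2.0) + 0.5 * i
C = [[0.0, 0.5], [0.5, 0.0]]
avg, ns = average_cost_hysteresis(f, C, [-0.2, -1.0], [1.0, 1.0], 1.0, 0, 0.5, 1.5)
```

On a grid, the recursion $x_{k+1} = \max(x_k + \Delta y_{k+1}, 0)$ is exactly the Skorokhod map applied to the sampled free path.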

The related dynamic programming equation for minimizing $\theta(u,x,i)$ is solved in
Theorem 4. Theorem 5 is a verification theorem showing that there is a
stationary optimal strategy for which the minimum average cost is attained
as a limit, not merely a lim inf, in (25).
By (19) and (20), there are $j \in A$, $\theta \in \mathbb{R}$ and a subsequence of $r$,
still denoted by $r$, such that

$$V^r(0,j) \le V^r(0,i), \quad i \in A,$$

and

$$r V^r(0,j) \to \theta \quad \text{as } r \to 0.$$

By Theorem 3, we have

$$0 \le \hat V^r(x,i) = V^r(x,i) - V^r(0,j) \le C(i,j) + V^r(x,j) - V^r(0,j) \le C(i,j) + Kx.$$

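Since $\hat V^r(x,i)$ has the same derivatives as $V^r(x,i)$, we have

$$\begin{aligned}
&(a)\ \hat V^r(\cdot,i) \in W^{2,\infty}(\mathbb{R}_+), && i \in A,\\
&(b)\ L^i \hat V^r(x,i) + r V^r(x,i) - f(x,i) \le 0 \text{ a.e.}, && i \in A,\\
&(c)\ \hat V^r(x,i) - M_i \hat V^r(x) \le 0, && i \in A,\\
&(d)\ \text{(b)} \times \text{(c)} = 0,\\
&(e)\ D\hat V^r(0,i) = 0, && i \in A.
\end{aligned} \tag{26}$$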

By (17), there are a function $V(x,i)$ and a subsequence of $r$, still denoted
by $r$, such that

$$\hat V^r(x,i) \to V(x,i)$$

and

$$r V^r(x,i) \to \theta$$

uniformly on compact subsets of $\mathbb{R}_+$ as $r \to 0$, for all $i$.
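In the special case of a single action ($M = 1$) there is no switching, and $\theta$ should reduce to the stationary mean of $f$. This sketch assumes the standard fact that reflected Brownian motion with drift $d < 0$ and variance $\sigma^2$ has the exponential stationary law with rate $\lambda = 2|d|/\sigma^2$; the function name and the truncation of the integral are illustrative choices.

```python
import numpy as np

def stationary_average_cost(f, d, sigma, n=200_001, tail=40.0):
    """theta for M = 1: integral of f against lam * exp(-lam * x), lam = 2|d| / sigma**2."""
    lam = 2.0 * abs(d) / sigma**2
    x = np.linspace(0.0, tail / lam, n)            # ignores a tail of mass exp(-tail)
    y = f(x) * lam * np.exp(-lam * x)
    h = x[1] - x[0]
    return h * (y.sum() - 0.5 * (y[0] + y[-1]))   # composite trapezoidal rule

theta = stationary_average_cost(lambda x: 1.0 - np.exp(-x), d=-0.5, sigma=1.0)
```

For $f(x) = 1 - e^{-x}$ the exact value is $1 - \lambda/(\lambda + 1)$, i.e. $1/2$ when $d = -0.5$, $\sigma = 1$.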




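Theorem 4. $V(x,i)$ satisfies

$$\begin{aligned}
&(a)\ V(\cdot,i) \in W^{2,p,\gamma}(\mathbb{R}_+), && p > 1,\ \gamma > 0,\\
&(b)\ L^i V(x,i) + \theta - f(x,i) \le 0 \text{ a.e. on } \mathbb{R}_+, && i \in A,\\
&(c)\ V(x,i) - M_i V(x) \le 0,\\
&(d)\ \text{(b)} \times \text{(c)} = 0,\\
&(e)\ DV(0,i) = 0, && i \in A,
\end{aligned}$$

and

$$0 \le V(x,i) \le C(i,j) + Kx. \tag{27}$$

Proof. Let $r \to 0$ in (26).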

Let $g$ be a twice continuously differentiable function on $\mathbb{R}_+$ such that

$$g(x) > 0 \text{ on } \mathbb{R}_+, \qquad Dg(0) = 0, \qquad g(x) = e^{\alpha x}, \ x > B,$$

and

$$g(x) + |Dg(x)| + |D^2 g(x)| \le K', \quad x \le B,$$

for some constants $B$ and $K'$.

Lemma 2. For any $u$, $x$ and $i$, $E^u_{x,i}\, g(x(T))$ is a bounded function of $T$.

Proof. By (3), there are $\alpha > 0$ and $\beta < 0$ such that

$$\tfrac12 \alpha^2 \sigma_i^2 + d_i \alpha < \beta, \quad i \in A.$$

Let $u = \{s(n),u(n)\}_{n=1}^\infty$ with $s(m) = \infty$ for some $m > 0$. By Stroock-Varadhan [21],
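$$\begin{aligned}
h(T) &\equiv E^u_{x,i}\, g(x(T))\\
&= g(x) + E^u_{x,i}\Big\{\sum_{n=1}^{m} \int_{T \wedge s(n-1)}^{T \wedge s(n)} -L^{u(t)} g(x(t))\,dt\Big\}\\
&= g(x) + E^u_{x,i} \int_0^T -L^{u(t)} g(x(t))\,dt
\end{aligned}$$

is finite for all $T$. So

$$\begin{aligned}
Dh(T) &= E^u_{x,i}\big\{-L^{u(T)} g(x(T))\big\}\\
&= E^u_{x,i}\Big\{I_{\{x(T) > B\}} \big(\tfrac12 \alpha^2 \sigma_{u(T)}^2 + d_{u(T)} \alpha\big) e^{\alpha x(T)} - I_{\{x(T) \le B\}} L^{u(T)} g(x(T))\Big\}.
\end{aligned}$$

On $\{x(T) \le B\}$ the integrand is bounded because of the bound on $g$, $Dg$ and $D^2 g$ there, while on $\{x(T) > B\}$ it is at most $\beta e^{\alpha x(T)}$ with $\beta < 0$. Hence $Dh(T) < 0$ whenever

$$h(T) > K''$$

for some constant $K'' > K'$. Thus

$$h(T) \le K'' + g(x), \quad T > 0.$$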


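For any admissible strategy $u$, let $u_m$ denote the strategy that agrees with $u$ up to its $m$-th switch and makes no further switches. Since $s(n) \to \infty$, $x(T)$ under $u_m$ coincides with $x(T)$ under $u$ for all large $m$, so by Fatou's lemma

$$E^u_{x,i}\, g(x(T)) \le \liminf_{m \to \infty} E^{u_m}_{x,i}\, g(x(T)) \le K'' + g(x).$$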


This proves the lemma.
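The choice of $\alpha$ and $\beta$ in the proof can be made explicit. For $g(x) = e^{\alpha x}$ and assuming $L^i = -(\tfrac12 \sigma_i^2 D^2 + d_i D)$ (the sign convention consistent with (15c) and with the computation of $Dh(T)$ above, though $L^i$ is not defined in this section), $-L^i g = (\tfrac12 \sigma_i^2 \alpha^2 + d_i \alpha) g$. A small sketch, with a hypothetical function name:

```python
import numpy as np

def choose_alpha_beta(d, sigma):
    """Pick alpha > 0 and beta < 0 with 0.5*alpha**2*sigma_i**2 + d_i*alpha < beta for all i.

    If alpha <= -d_i / sigma_i**2, then 0.5*sigma_i**2*alpha + d_i <= d_i / 2 < 0, so every
    quadratic is negative; half of the largest value is then a valid strict bound beta.
    """
    d = np.asarray(d, dtype=float)
    s2 = np.asarray(sigma, dtype=float) ** 2
    if np.any(d >= 0.0):
        raise ValueError("condition (3), d_i < 0, is required")
    alpha = np.min(-d / s2)
    values = 0.5 * s2 * alpha**2 + d * alpha
    beta = 0.5 * np.max(values)
    return alpha, beta, values

alpha, beta, values = choose_alpha_beta([-0.5, -1.0], [1.0, 0.5])
```

Here condition (3) is exactly what makes a positive $\alpha$ with negative $\beta$ possible.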


Let $u^* = \{s(n),u(n)\}_{n=1}^\infty$ be defined by

$$s(1) = \inf\{t \ge 0 : V(x(t),i) = M_i V(x(t))\},$$

$$u(1) = \min\{j \in A : V(x(s(1)),i) = C(i,j) + V(x(s(1)),j)\}$$

and

$$s(n) = \inf\{t > s(n-1) : V(x(t),u(n-1)) = M_{u(n-1)} V(x(t))\},$$

$$u(n) = \min\{j \in A : V(x(s(n)),u(n-1)) = C(u(n-1),j) + V(x(s(n)),j)\}$$

for $n > 1$.
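On a grid, the rule defining $u^*$ becomes a lookup table: at each state $(x, i)$ either keep $i$, or switch to the smallest $j$ at which $V(x,i) = C(i,j) + V(x,j)$. The sketch below assumes $V$ is given as an array over a grid (for example a numerical approximation) and uses a tolerance for the equality test; `switching_rule` and the sample data are hypothetical.

```python
import numpy as np

def switching_rule(V, C, tol=1e-9):
    """rule[k, i] = -1 if action i is kept at grid point k; otherwise the smallest j != i
    with V[k, i] = C[i][j] + V[k, j] (up to tol), as in the definition of u(n)."""
    N, M = V.shape
    rule = np.full((N, M), -1, dtype=int)
    for k in range(N):
        for i in range(M):
            for j in range(M):
                if j != i and abs(V[k, i] - (C[i][j] + V[k, j])) <= tol:
                    rule[k, i] = j
                    break  # keep the smallest such j
    return rule

V = np.array([[1.0, 0.8], [1.3, 0.8], [1.0, 1.5]])
C = [[0.0, 0.5], [0.5, 0.0]]
rule = switching_rule(V, C)
```

Along a simulated path, $s(n)$ is then the first time the current entry `rule[k, u(n-1)]` differs from $-1$, and $u(n)$ is that entry; the resulting strategy depends only on the current state, which is why $u^*$ is stationary.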


Theorem 5. $\theta(u^*,x,i) = \theta \le \theta(u,x,i)$ for any admissible strategy $u$.


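Proof. By Theorem 4,

$$\begin{aligned}
&E^{u^*}_{x,i} V(x(T),u(T)) - V(x,i)\\
&\quad = E^{u^*}_{x,i} \sum_{n=1}^\infty \big\{V(x(T \wedge s(n)),u(T \wedge s(n))) - V(x(T \wedge s(n-1)),u(T \wedge s(n-1)))\big\}\\
&\quad = E^{u^*}_{x,i} \sum_{n=1}^\infty \Big\{\int_{T \wedge s(n-1)}^{T \wedge s(n)} -L^{u(n-1)} V(x(t),u(n-1))\,dt - I_{\{s(n) < T\}} C(u(n-1),u(n))\Big\},
\end{aligned} \tag{27}$$

where $u(T \wedge s(0)) = i$. Hence

$$\theta = \frac{J(u^*,x,i,T)}{T} - \frac{1}{T}\Big\{V(x,i) - E^{u^*}_{x,i} V(x(T),u(T))\Big\}, \tag{28}$$

and then

$$\theta = \theta(u^*,x,i) \tag{29}$$

by Lemma 2. To prove $\theta \le \theta(u,x,i)$ for any admissible $u$, we simply replace the equalities in
(27), (28) and (29) by inequalities.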
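The inequality step for an arbitrary admissible $u$ can be written out as follows. This is a sketch assuming, as in (27), that the generalized Itô formula applies to $V$ along the controlled process: Theorem 4 (b) bounds the integrand between switches and Theorem 4 (c) bounds the jump of $V$ at each switch.

```latex
\begin{aligned}
-L^{u(n-1)} V(x(t),u(n-1)) &\ge \theta - f(x(t),u(n-1)), && s(n-1) \le t < s(n),\\
V(x,u(n)) - V(x,u(n-1)) &\ge -C(u(n-1),u(n)),\\
E^u_{x,i} V(x(T),u(T)) - V(x,i) &\ge \theta T - J(u,x,i,T),\\
\frac{J(u,x,i,T)}{T} &\ge \theta - \frac{V(x,i)}{T} + \frac{E^u_{x,i} V(x(T),u(T))}{T}
  \ge \theta - \frac{V(x,i)}{T}.
\end{aligned}
```

Taking the lim inf as $T \to \infty$ gives $\theta \le \theta(u,x,i)$. Only the lower bound $V \ge 0$ from Theorem 4 is used here; Lemma 2 is needed only for $u^*$, to show that the correction term in (28) vanishes.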


Note that $u^*$ is a stationary strategy.